MARKED-UP VERSION

METHOD OF REFLECTING TIME/LANGUAGE DISTORTION IN
OBJECTIVE SPEECH QUALITY ASSESSMENT



Field of the Invention

The present invention relates generally to communications systems and, in
particular, to speech quality assessment.

Background of the Related Art 

Performance of a wireless communication system can be measured,
among other things, in terms of speech quality. In the current art, there are two
techniques of speech quality assessment. The first technique is a subjective technique
(hereinafter referred to as "subjective speech quality assessment"). In subjective speech
quality assessment, human listeners are typically used to rate the speech quality of
processed speech, wherein processed speech is a transmitted speech signal which has
been processed at the receiver. This technique is subjective because it is based on the
perception of the individual human, and human assessment of speech quality by native
listeners, i.e., people who speak the language of the speech material being presented or
listened to, typically takes into account language effects. Studies have shown that a
listener's knowledge of language affects the scores in subjective listening tests. Scores
given by native listeners were lower than scores given by non-native listeners in
subjective listening tests when language information in the speech was defective,
i.e., muted. In a normal telephone conversation, the listener is often a native listener.
Thus, it is preferable to use native listeners for subjective speech quality assessment in
order to emulate typical conditions. Subjective speech quality assessment techniques
provide a good assessment of speech quality but can be expensive and time consuming.

The second technique is an objective technique (hereinafter referred to as
"objective speech quality assessment"). Objective speech quality assessment is not based
on the perception of the individual human. Some objective speech quality assessment
techniques are based on known source speech or reconstructed source speech estimated
from processed speech. Other objective speech quality assessment techniques are not
based on known source speech but on processed speech only. These latter techniques are
referred to herein as "single-ended objective speech quality assessment techniques" and
are often used when known source speech or reconstructed source speech is unavailable.

Current single-ended objective speech quality assessment techniques,
however, do not provide as good an assessment of speech quality as subjective
speech quality assessment techniques. One reason is that current single-ended objective
speech quality assessment techniques have been unable to account for language effects
in their speech assessment.

Accordingly, there exists a need for a single-ended objective speech
quality assessment technique which accounts for language effects in assessing speech
quality.

Summary of the Invention 

The present invention is an objective speech quality assessment technique
that reflects the impact of distortions which can dominate overall speech quality
assessment by modeling the impact of such distortions on subjective speech quality
assessment, thereby accounting for language effects in objective speech quality
assessment. In one embodiment, the objective speech quality assessment technique of the
present invention comprises the steps of detecting distortions in an interval of speech
activity using envelope information, and modifying an objective speech quality
assessment value associated with the speech activity to reflect the impact of the
distortions on subjective speech quality assessment. In one embodiment, the objective
speech quality assessment technique also distinguishes types of distortions, such as short
bursts, abrupt stops and abrupt starts, and modifies the objective speech quality
assessment values to reflect the different impacts of each type of distortion on subjective
speech quality assessment.

Brief Description of the Drawings

The features, aspects, and advantages of the present invention will become
better understood with regard to the following description, appended claims, and
accompanying drawings where:

Fig. 1 depicts a flowchart illustrating an objective speech quality assessment
technique accounting for language effects in accordance with one embodiment of the
present invention;

Fig. 2 depicts a flowchart illustrating a voice activity detector (VAD) which
detects voice activity by examining envelope information associated with the speech
signal in accordance with one embodiment of the present invention;

Fig. 3 depicts an example VAD activity diagram illustrating intervals T and G of
speech and non-speech activities, respectively;

Fig. 4 depicts a flowchart illustrating an embodiment for determining whether
speech activity is a short burst or impulsive noise and for modifying objective speech
frame quality assessment v_s(m) when a short burst or impulsive noise is determined;

Fig. 5 depicts a flowchart illustrating an embodiment for determining whether
speech activity has an abrupt stop or mute and for modifying objective speech frame
quality assessment v_s(m) when it is determined that such speech activity has an abrupt
stop or mute; and

Fig. 6 depicts a flowchart illustrating an embodiment for determining whether
speech activity has an abrupt start and for modifying objective speech frame quality
assessment v_s(m) when it is determined that such speech activity has an abrupt start.



Detailed Description 

The present invention is an objective speech quality assessment technique
that reflects the impact of distortions which can dominate overall speech quality
assessment by modeling the impact of such distortions on subjective speech quality
assessment, thereby accounting for language effects in objective speech quality
assessment.

Fig. 1 depicts a flowchart 100 illustrating an objective speech quality
assessment technique accounting for language effects in accordance with one embodiment
of the present invention. In step 102, speech signal s(n) is processed to determine objective
speech frame quality assessment v_s(m), i.e., objective quality of speech at frame m. In
one embodiment, each frame m corresponds to a 64 ms interval. The manner of
processing a speech signal s(n) to obtain objective speech frame quality assessment v_s(m)
(which does not account for language effects) is well-known in the art. One example of
such processing is described in co-pending application serial number 10/186,862, entitled
"Compensation Of Utterance-Dependent Articulation For Speech Quality Assessment",
filed on July 01, 2002 by inventor Doh-Suk Kim, attached herein as Appendix A, which is
being incorporated herein by reference.
In step 105, speech signal s(n) is analyzed for voice activity by, for
example, a voice activity detector (VAD). VADs are well-known in the art. Fig. 2
depicts a flowchart 200 illustrating a VAD which detects voice activity by examining
envelope information associated with the speech signal in accordance with one
embodiment of the present invention. In step 205, envelope signals y_k(n) are summed up
for all cochlear channels k to form summed envelope signal y(n) in accordance with
equation (1):

    y(n) = \sum_{k=1}^{N_{cb}} y_k(n)        equation (1)

where y_k(n) = \sqrt{s_k^2(n) + \hat{s}_k^2(n)}, n represents a time index, N_{cb} represents
a total number of critical bands, s_k(n) represents the output of speech signal s(n) through
cochlear channel k, i.e., s_k(n) = s(n) * h_k(n), and \hat{s}_k(n) is the Hilbert transform
of s_k(n).

In step 210, a frame envelope e(l) is computed every 2 ms by multiplying
summed envelope signal y(n) with a 4 ms Hamming window w(n) in accordance with
equation (2):

    e(l) = \log\left[ \sum_{n=0}^{31} y^{(l)}(n)\, w(n) + 1 \right]        equation (2)

where y^{(l)}(n) is the 2 ms l-th frame signal of the summed envelope signal y(n). It should
be understood that the durations of the frame envelope e(l) and Hamming window w(n)
are merely illustrative and that other durations are possible. In step 215, a flooring
operation is applied to frame envelope e(l) in accordance with equation (3):

    e(l) = \begin{cases} e(l) & \text{if } e(l) > 5 \\ 5 & \text{otherwise} \end{cases}        equation (3)
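By way of illustration only, steps 210 and 215 may be sketched in Python as follows. The sketch assumes an 8 kHz sampling rate (so that a 4 ms window spans 32 samples and a 2 ms frame advance spans 16 samples, consistent with the upper summation limit of 31 in equation (2)), a natural logarithm, and the common Hamming coefficients 0.54 and 0.46; none of these choices is stated in the description, and the function names are hypothetical.

```python
import math


def hamming(length):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(length-1)) (assumed form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_envelope(y, window_len=32, hop=16, floor=5.0):
    """Return floored frame envelopes e(l) for summed envelope samples y(n).

    window_len and hop assume 8 kHz sampling (4 ms window, 2 ms advance).
    """
    w = hamming(window_len)
    envelopes = []
    for start in range(0, len(y) - window_len + 1, hop):
        acc = sum(y[start + n] * w[n] for n in range(window_len))
        e = math.log(acc + 1.0)                      # equation (2), natural log assumed
        envelopes.append(e if e > floor else floor)  # equation (3), flooring at 5
    return envelopes
```

A silent input yields frame envelopes that all sit at the floor value of 5, which is why the voice activity decision of equation (5) compares e(l) against the same value.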



In step 220, time derivative \Delta e(l) of floored frame envelope e(l) is obtained in
accordance with equation (4):

    \Delta e(l) = [right-hand side not legible]        equation (4)

where -3 \le j \le 3.



In step 225, voice activity detection is performed in accordance with
equation (5):

    vad(l) = \begin{cases} 1 & \text{if } e(l) > 5 \\ 0 & \text{otherwise} \end{cases}        equation (5)

In step 230, the result of equation (5), i.e., vad(l), can then be refined based on the
duration of 1's and 0's in the output. For example, if the duration of 0's in vad(l) is
shorter than 8 ms, then vad(l) shall be changed to 1's for that duration. Similarly, if the
duration of 1's in vad(l) is shorter than 8 ms, then vad(l) shall be changed to 0's for that
duration. Fig. 3 depicts an example VAD activity diagram 30 illustrating intervals T and
G of speech and non-speech activities, respectively. It should be understood that speech
activities associated with intervals T may include, for example, actual speech, data or
noise.
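By way of illustration only, steps 225 and 230 may be sketched in Python as follows. The sketch assumes one vad(l) value per 2 ms frame, so that 8 ms corresponds to 4 frames; it also assumes that short runs of 0's are filled before short runs of 1's are removed and that runs touching either end of the signal are treated like interior runs. The description does not fix these details, and the function names are hypothetical.

```python
def vad_from_envelope(e, threshold=5.0):
    """Equation (5): 1 where the floored frame envelope exceeds 5, else 0."""
    return [1 if value > threshold else 0 for value in e]


def _flip_short_runs(seq, target, min_frames):
    """Invert every run of `target` values that is shorter than min_frames."""
    out = list(seq)
    i = 0
    while i < len(seq):
        if seq[i] != target:
            i += 1
            continue
        j = i
        while j < len(seq) and seq[j] == target:
            j += 1
        if j - i < min_frames:
            out[i:j] = [1 - target] * (j - i)
        i = j
    return out


def refine_vad(vad, frame_ms=2, min_run_ms=8):
    """Step 230: remove runs of 0's, then runs of 1's, shorter than min_run_ms."""
    min_frames = min_run_ms // frame_ms
    filled = _flip_short_runs(vad, 0, min_frames)   # short gaps become speech
    return _flip_short_runs(filled, 1, min_frames)  # short blips become non-speech
```

The refined sequence can then be scanned for maximal runs of 1's and 0's to obtain the intervals T and G of Fig. 3.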



Returning to flowchart 100 of Fig. 1, upon analyzing speech signal s(n) for
speech activity, interval T is examined to determine whether the associated speech
activity corresponds to a short burst or impulsive noise in step 110. If the speech activity
in interval T is determined to be a short burst or impulsive noise, then objective speech
frame quality assessment v_s(m) is modified in step 115 to obtain a modified objective
speech frame quality assessment \tilde{v}_s(m). The modified objective speech frame quality
assessment \tilde{v}_s(m) accounts for the effects of short burst or impulsive noise by modeling
or simulating the impact of short bursts or impulsive noise on subjective speech quality
assessment.

From step 115 or if in step 110 the speech activity in interval T is not
determined to be a short burst or impulsive noise, then flowchart 100 proceeds to step
120 where the speech activity in interval T is examined to determine whether it has an
abrupt stop or mute. If the speech activity in interval T is determined to have an abrupt
stop or mute, then objective speech frame quality assessment v_s(m) is modified in step
125 to obtain a modified objective speech frame quality assessment \tilde{v}_s(m). The modified
objective speech frame quality assessment \tilde{v}_s(m) accounts for the effects of the abrupt
stop or mute by modeling or simulating the impact of an abrupt stop or mute and
subsequent release on subjective speech quality assessment.

From step 125 or if in step 120 the speech activity in interval T is not
determined to have an abrupt stop or mute, then flowchart 100 proceeds to step 130
where the speech activity in interval T is examined to determine whether it has an abrupt
start. If the speech activity in interval T is determined to have an abrupt start, then
objective speech frame quality assessment v_s(m) is modified in step 135 to obtain a
modified objective speech frame quality assessment \tilde{v}_s(m). The modified objective
speech frame quality assessment \tilde{v}_s(m) accounts for the effects of the abrupt start by
modeling or simulating the impact of an abrupt start on subjective speech quality
assessment. From step 135 or if in step 130 the speech activity in interval T is not
determined to have an abrupt start, then flowchart 100 proceeds to step 145 where the
results of modifications to objective speech frame quality assessment v_s(m), if any, are
integrated into the original objective speech frame quality assessment v_s(m) of step 102.

Techniques for determining whether speech activity is a short burst (or
impulsive noise) or has an abrupt stop (or mute) or an abrupt start, i.e., steps 110, 120 and
130, along with techniques for modifying objective speech frame quality assessment
v_s(m), i.e., steps 115, 125 and 135, in accordance with one embodiment of the invention
will now be described. Fig. 4 depicts a flowchart 400 illustrating an embodiment for
determining whether speech activity is a short burst or impulsive noise and for modifying
objective speech frame quality assessment v_s(m) when a short burst or impulsive noise is
determined. In step 405, an impulsive noise frame l_I is determined by finding a frame l in
interval T_i where frame envelope e(l) is maximum in accordance, for example, with
equation (6):

    l_I = \arg\max_{u_i \le l \le d_i} e(l)        equation (6)

where u_i and d_i represent frames l at the beginning and end of interval T_i, respectively.
In step 410, frame envelope e(l_I) is compared to a listener threshold value indicating
whether a human listener can consider the corresponding frame l_I as an annoying short
burst. In one embodiment, the listener threshold value is 8; that is, in step 410, e(l_I) is
checked to determine whether it is greater than 8. If frame envelope e(l_I) is not greater
than the listener threshold value, then in step 415 the speech activity is determined not to
be a short burst or impulsive noise.

If frame envelope e(l_I) is greater than the listener threshold value, then in
step 420 the duration of interval T_i is checked to determine whether it satisfies both a
short burst threshold value and a perception threshold value. That is, interval T_i is
checked to determine whether it is not too short to be perceived by a human
listener and not too long to be categorized as a short burst. In one embodiment, if the
duration of interval T_i is greater than or equal to 28 ms and less than or equal to 60 ms,
i.e., 28 ms \le T_i \le 60 ms, then both of the threshold values of step 420 are satisfied.
Otherwise the threshold values of step 420 are not satisfied. If the threshold values of
step 420 are not satisfied, then in step 425 the speech activity is determined not to be a
short burst or impulsive noise.
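By way of illustration only, steps 405 through 425 may be sketched in Python as follows. The sketch assumes that the interval is given by its first and last frame indices, that each frame spans 2 ms, and that the interval duration is the frame count multiplied by the frame length; the function name and the use of None to signal "not a short burst" are hypothetical.

```python
def short_burst_candidate(e, u, d, frame_ms=2,
                          listener_threshold=8.0, min_ms=28, max_ms=60):
    """Return the peak frame of interval u..d (inclusive) if steps 410 and 420 pass.

    Returns None when either check fails (steps 415 and 425).
    """
    # Equation (6): frame with the largest envelope inside the interval.
    l_peak = max(range(u, d + 1), key=lambda l: e[l])
    if e[l_peak] <= listener_threshold:           # step 410 not satisfied
        return None
    duration_ms = (d - u + 1) * frame_ms          # assumed duration convention
    if not (min_ms <= duration_ms <= max_ms):     # step 420 not satisfied
        return None
    return l_peak
```

A non-None result only marks the interval as a candidate; the remaining checks of Fig. 4 still apply before any modification is made.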

If the threshold values of step 420 are satisfied, then in step 430 a
maximum delta frame envelope \Delta e(l) is determined from the frame envelopes e(l) in the
one or more frames prior to the beginning of interval T_i through the first one or more
frames of interval T_i and subsequently compared to an abrupt change threshold value,
such as 0.25. The abrupt change threshold value represents a criterion for identifying an
abrupt change in the frame envelope. In one embodiment, a maximum delta frame
envelope \Delta e(l) is determined from frame envelope e(u_i-1), i.e., the frame envelope
immediately preceding interval T_i, through frame envelope e(u_i+5), i.e., the fifth frame
envelope in interval T_i, and compared to a threshold value of 0.25; that is, in step 430, it
is checked whether equation (7) is satisfied:

    \max_{u_i-1 \le l \le u_i+5} \Delta e(l) > 0.25        equation (7)

If the maximum delta frame envelope \Delta e(l) does not exceed the threshold value, then in
step 435 the speech activity is determined not to be a short burst or impulsive noise.
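By way of illustration only, the check of step 430 may be sketched in Python as follows. The sketch assumes that the delta frame envelope is already available as a list indexed by frame and that the search window is clipped at the ends of that list; the function name and parameter names are hypothetical.

```python
def has_abrupt_rise(delta_e, u, before=1, after=5, threshold=0.25):
    """Step 430: largest delta frame envelope from frame u-1 through u+5 exceeds 0.25.

    u is the first frame of the interval; the window is clipped to the list bounds.
    """
    lo = max(u - before, 0)
    hi = min(u + after, len(delta_e) - 1)
    return max(delta_e[lo:hi + 1]) > threshold
```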

If the maximum delta frame envelope \Delta e(l) does exceed the threshold
value, then in step 440 it is determined whether frame m_I would be sufficiently annoying
to a human listener, where m_I corresponds to the frame m which is impacted most by
impulsive noise frame l_I. In one embodiment, step 440 is achieved by comparing
a ratio of objective speech frame quality assessment v_s(m_I) to modulation noise
reference unit v_g(m_I) against a noise threshold value. Step 440 may be expressed, for
example, using a noise threshold value of 1.1 and equation (8):

    \frac{v_s(m_I)}{v_g(m_I)} < 1.1        equation (8)

wherein if equation (8) is satisfied, it would be determined that frame m_I has sufficient
annoyance to a human listener. If it is determined that objective speech frame quality
assessment v_s(m_I) would be sufficiently annoying to a human listener, then in step 445 the
speech activity is determined not to be a short burst or impulsive noise.

If it is determined that objective speech frame quality assessment v_s(m_I)
would not be sufficiently annoying to a human listener, then in step 450 conditions
related to the durations of intervals G_{i-1,i}, G_{i,i+1}, T_{i-1} and/or T_{i+1} satisfying
certain minimum or maximum duration threshold values are checked to verify whether the
speech activity belongs to human speech. In one embodiment, the conditions of step 450
are expressed as equations (9) and (10):

    G_{i-1,i} < 180 ms and G_{i,i+1} > 40 ms and T_{i-1} > 50 ms        equation (9)
    G_{i-1,i} > 40 ms and G_{i,i+1} < 100 ms and T_{i+1} > 60 ms        equation (10)

If any of these equations or conditions are satisfied, then in step 455 the speech activity is
determined not to be a short burst or impulsive noise. Rather, the speech activity is
determined to be natural speech. It should be understood that the minimum and
maximum duration threshold values used in equations (9) and (10) are merely illustrative
and may be different.
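By way of illustration only, the conditions of step 450 may be sketched in Python as follows. The sketch assumes that the four durations are supplied in milliseconds, with the two gap arguments denoting the non-speech intervals immediately before and after the interval under test and the two remaining arguments denoting the neighboring speech intervals; the function and parameter names are hypothetical.

```python
def looks_like_natural_speech(gap_before_ms, gap_after_ms,
                              speech_before_ms, speech_after_ms):
    """Step 450: True if either equation (9) or equation (10) is satisfied."""
    cond9 = (gap_before_ms < 180 and gap_after_ms > 40
             and speech_before_ms > 50)
    cond10 = (gap_before_ms > 40 and gap_after_ms < 100
              and speech_after_ms > 60)
    return cond9 or cond10
```

A True result corresponds to step 455, in which the interval is treated as natural speech and left unmodified.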

If none of the conditions in step 450 are satisfied, then in step 460
objective speech frame quality assessment v_s(m) is modified in accordance with equation
(11):

    \tilde{v}_s(m) = [right-hand side not legible]        equation (11)





Fig. 5 depicts a flowchart 500 illustrating an embodiment for determining
whether speech activity has an abrupt stop or mute and for modifying objective speech
frame quality assessment v_s(m) when it is determined that such speech activity has an
abrupt stop or mute. In step 505, abrupt stop frame l_M is determined. The abrupt stop
frame l_M is determined by first finding negative peaks of delta frame envelope \Delta e(l) in
the speech activity using all frames l in interval T_i. Delta frame envelope \Delta e(l) has a
negative peak at l if \Delta e(l) < \Delta e(l+j) for -3 \le j \le 3, j \ne 0. Upon finding the
negative peaks, abrupt stop frame l_M is determined as the frame at which the negative
peaks of delta frame envelope \Delta e(l) reach their minimum. In step 510, delta frame
envelope \Delta e(l_M) is checked to determine whether an abrupt stop threshold value is
satisfied. The abrupt stop threshold value represents a criterion for determining whether
there was sufficient negative change in frame envelope from one frame l to another frame
l+1 to be considered an abrupt stop. In one embodiment, the abrupt stop threshold value
is -0.56 and step 510 may be expressed as equation (12):

    \Delta e(l_M) < -0.56        equation (12)

If delta frame envelope \Delta e(l_M) does not satisfy the abrupt stop threshold value, then in
step 515 the speech activity is determined not to have an abrupt stop or mute.

If delta frame envelope \Delta e(l_M) does satisfy the abrupt stop threshold value,
then in step 520 interval T_i is checked to determine if the speech activity is of sufficient
duration, e.g., longer than a short burst. In one embodiment, the duration of interval T_i is
checked to see if it exceeds the duration threshold value, e.g., 60 ms. That is, if T_i < 60
ms, then the speech activity associated with interval T_i is not of sufficient duration. If the
speech activity is considered not of sufficient duration, then in step 525 the speech
activity is determined not to have an abrupt stop or mute.

If the speech activity is considered of sufficient duration, then in step 530
a maximum frame envelope e(l) is determined for one or more frames prior to frame l_M
through frame l_M or beyond and subsequently compared against a stop-energy threshold
value. The stop-energy threshold value represents a criterion for determining whether a
frame envelope has sufficient energy prior to muting. In one embodiment, maximum
frame envelope e(l) is determined over a window of frames ending at frame l_M and
compared to a stop-energy threshold value of 9.5, i.e., max e(l) > 9.5. If the maximum
frame envelope e(l) does not satisfy the stop-energy threshold value, then in step 535 the
speech activity is determined not to have an abrupt stop or mute.

If the maximum frame envelope e(l) does satisfy the stop-energy threshold
value, then objective speech frame quality assessment v_s(m) is modified in accordance
with equation (13) for several frames m, such as m_M, ..., m_M+6:

    \tilde{v}_s(m) = |\Delta e(l_M)| \cdot \frac{-6}{1 + \exp[-2(m - m_M - 3)]}        equation (13)

where m_M corresponds to the frame m which is impacted most by abrupt stop frame l_M.
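By way of illustration only, steps 505 through 515 may be sketched in Python as follows. The sketch assumes that the delta frame envelope is available as a list indexed by frame, that the interval is given by its first and last frame indices, and that neighbors falling outside the list are simply ignored when testing for a peak; the function name and the use of None to signal "no abrupt stop" are hypothetical.

```python
def find_abrupt_stop_frame(delta_e, u, d, reach=3, stop_threshold=-0.56):
    """Return the abrupt stop frame in u..d (inclusive) if equation (12) holds.

    A frame is a negative peak when its delta envelope is strictly smaller
    than that of every neighbor within `reach` frames on either side.
    """
    peaks = []
    for l in range(u, d + 1):
        neighbors = [delta_e[l + j] for j in range(-reach, reach + 1)
                     if j != 0 and 0 <= l + j < len(delta_e)]
        if all(delta_e[l] < value for value in neighbors):
            peaks.append(l)
    if not peaks:
        return None
    l_stop = min(peaks, key=lambda l: delta_e[l])    # deepest negative peak
    return l_stop if delta_e[l_stop] < stop_threshold else None  # equation (12)
```

The same structure, with the comparisons reversed and a positive threshold, would locate the positive peaks used for abrupt start detection.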

Fig. 6 depicts a flowchart 600 illustrating an embodiment for determining
whether speech activity has an abrupt start and for modifying objective speech frame
quality assessment v_s(m) when it is determined that such speech activity has an abrupt
start. In step 605, abrupt start frame l_S is determined. The abrupt start frame l_S is
determined by first finding positive peaks of delta frame envelope \Delta e(l) in the speech
activity using all frames l in interval T_i. Delta frame envelope \Delta e(l) has a positive peak
at l if \Delta e(l) > \Delta e(l+j) for -3 \le j \le 3, j \ne 0. Upon finding the positive peaks,
abrupt start frame l_S is determined as the frame at which the positive peaks of delta frame
envelope \Delta e(l) reach their maximum. In step 610, delta frame envelope \Delta e(l_S) is
checked to determine whether an abrupt start threshold value is satisfied. The abrupt start
threshold value represents a criterion for determining whether there was sufficient positive
change in frame envelope from one frame l to another frame l+1 to be considered an
abrupt start. In one embodiment, the abrupt start threshold value is 0.9 and step 610 may
be expressed as equation (14):

    \Delta e(l_S) > 0.9        equation (14)

If delta frame envelope \Delta e(l_S) does not satisfy the abrupt start threshold value, then in
step 615 the speech activity is determined not to have an abrupt start.

If delta frame envelope \Delta e(l_S) does satisfy the abrupt start threshold value,
then in step 620 interval T_i is checked to determine if the speech activity is of sufficient
duration, e.g., longer than a short burst. In one embodiment, the duration of interval T_i is
checked to see if it exceeds the short burst threshold value, e.g., 60 ms. That is, if T_i < 60
ms, then the speech activity associated with interval T_i is not of sufficient duration. If the
speech activity is not of sufficient duration, then in step 625 the speech activity is
determined not to have an abrupt start.

If the speech activity is of sufficient duration, then in step 630 a maximum
frame envelope e(l) is determined for frame l_S or prior through one or more frames after
frame l_S and subsequently compared against a start-energy threshold value. The
start-energy threshold value represents a criterion for determining whether a frame
envelope has sufficient energy. In one embodiment, maximum frame envelope e(l) is
determined for frames l_S through l_S+7 and compared to a start-energy threshold value
of 12, i.e., max e(l) < 12. If the maximum frame envelope e(l) does not satisfy the
start-energy threshold value, then in step 635 the speech activity is determined not to have
an abrupt start.

If the maximum frame envelope e(l) does satisfy the start-energy threshold
value, then objective speech frame quality assessment v_s(m) is modified in accordance
with equation (16) for several frames m, such as m_S, ..., m_S+6:

    \tilde{v}_s(m) = \frac{v_s(m)}{1 + \exp[-0.4(m - m_S)/\Delta e(l_S) - 10]}        equation (16)

where m_S corresponds to the frame m which is impacted most by abrupt start frame l_S.
It should be understood that the values used in equations (11), (13) and (16) were derived
empirically. Other values are possible. Thus, the present invention should not be limited
to those specific values.

Note that upon determining modified objective speech frame quality
assessment \tilde{v}_s(m), the integration performed in step 145 may be achieved using
equation (17):

    v_s(m) = \min(\tilde{v}_{s,I}(m), \tilde{v}_{s,M}(m), \tilde{v}_{s,S}(m))        equation (17)

where \tilde{v}_{s,I}(m), \tilde{v}_{s,M}(m) and \tilde{v}_{s,S}(m) correspond to the modified objective
speech frame quality assessment \tilde{v}_s(m) of equations (11), (13) and (16), respectively.
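By way of illustration only, the integration of step 145 may be sketched in Python as follows. The sketch assumes that each of the three modified assessments is supplied as a list with one value per frame m, and that frames left untouched by a given detector simply carry the original assessment in that list; this convention and the function name are hypothetical.

```python
def integrate_assessments(v_impulse, v_stop, v_start):
    """Equation (17): per-frame minimum of the three modified assessments."""
    return [min(a, b, c) for a, b, c in zip(v_impulse, v_stop, v_start)]
```

Taking the minimum means that, for each frame, the detector reporting the largest degradation determines the integrated value.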
Although the present invention has been described in considerable detail
with reference to certain embodiments, other versions are possible. For example, the
order of the steps in the flowcharts may be re-arranged, or some steps (or criteria) may
be deleted from or added to the flowcharts. Therefore, the spirit and scope of the present
invention should not be limited to the description of the embodiments contained herein.
It should also be understood by those skilled in the art that the present invention may be
implemented either as hardware or software incorporated into some type of processor.

Claims

I claim:

1. A method for objectively assessing speech quality comprising the steps of:
detecting distortions in an interval of speech activity using envelope
information; and
modifying an objective speech quality assessment value associated with
the speech activity to reflect the impact of the distortions on subjective speech
quality assessment.

2. The method of claim 1, wherein the step of modifying includes the step of
determining the objective speech quality assessment values for the speech
activity.

3. The method of claim 1, wherein the distortions being detected are impulsive noise,
abrupt stop or abrupt start.

4. The method of claim 1, wherein the step of detecting includes the step of
determining a distortion type.

5. The method of claim 4, wherein the distortion type is determined to be impulsive
noise if the envelope information indicates that the speech activity can be
perceived by a human listener to be noise and if the interval is of a duration long
enough to be perceived by a human listener but not too long for a short burst.



6. The method of claim 4, wherein the distortion type is determined to be impulsive
noise if the envelope information indicates that the speech activity can be
perceived by a human listener to be noise, if a ratio of the objective speech quality
assessment value to a modulation noise reference unit indicates a human listener
would perceive annoying noise, and if the interval is of a duration long enough to
be perceived by a human listener but not too long for a short burst.

7. The method of claim 4, wherein the objective quality assessment value associated
with the speech activity is modified in accordance with the following equation to
obtain a modified objective quality assessment value if the distortion type is
impulsive noise:

    \tilde{v}_s(m) = \frac{v_s(m)}{1 + \exp[-8.2(m - m_I)/e(l_I) - 10]}

where v_s(m) is the objective quality assessment value and \tilde{v}_s(m) is the modified
objective quality assessment value.

8. The method of claim 4, wherein the distortion type is determined to be abrupt stop
if the envelope information indicates that there was a sufficient negative change
in frame energy from one frame to another to be considered an abrupt stop and if
the interval is of a duration longer than a short burst.

9. The method of claim 4, wherein the distortion type is determined to be abrupt stop
if the envelope information indicates that a maximum frame envelope had
sufficient energy prior to ending the interval, and if the interval is of a duration
longer than a short burst.

10. The method of claim 4, wherein the objective quality assessment value associated
with the speech activity is modified in accordance with the following equation to
obtain a modified objective quality assessment value if the distortion type is
impulsive noise:

    \tilde{v}_s(m) = |\Delta e(l_M)| \cdot \frac{-6}{1 + \exp[-2(m - m_M - 3)]}

where v_s(m) is the objective quality assessment value and \tilde{v}_s(m) is the modified
objective quality assessment value.

11. The method of claim 4, wherein the distortion type is determined to be abrupt start
if the envelope information indicates that there was a sufficient positive change
in frame energy from one frame to another to be considered an abrupt start and if
the interval is of a duration longer than a short burst.

12. The method of claim 4, wherein the distortion type is determined to be abrupt stop
if the envelope information indicates that a maximum frame envelope had
sufficient energy towards a beginning of the interval, and if the interval is of a
duration longer than a short burst.



13. The method of claim 4, wherein the objective quality assessment value associated
with the speech activity is modified in accordance with the following equation to
obtain a modified objective quality assessment value if the distortion type is
impulsive noise:

    \tilde{v}_s(m) = \frac{v_s(m)}{1 + \exp[-0.4(m - m_S)/\Delta e(l_S) - 10]}

where v_s(m) is the objective quality assessment value and \tilde{v}_s(m) is the modified
objective quality assessment value.

14. The method of claim 1 comprising the additional step of:
prior to the step of detecting, determining the interval of speech activity
using the envelope information.

15. An objective speech quality assessment system comprising:
means for detecting distortions in an interval of speech activity using
envelope information; and
means for modifying an objective speech quality assessment value
associated with the speech activity to reflect the impact of the distortions on
subjective speech quality assessment.

16. The objective speech quality assessment system of claim 15, wherein the means
for modifying includes a means for determining the objective speech quality
assessment values without accounting for distortions for the speech activity.



15 



Kim 4 



The objective speech quality assessment system of claim 15, wherein the 
distortions being detected are impulsive noise, abrupt stop or abrupt start. 

The objective speech quality assessment system of claim 15, wherein the means 
for detecting includes a means for determining a distortion type. 

The objective speech quality assessment system of claim 18, wherein the means 
for detecting includes a voice activity detector for detecting intervals of speech 
activity, wherein the means for determining a distortion type examines intervals of 
speech activities detected by the voice activity detector. 
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Abstract of the Disclosure 

Disclosed is an objective speech quality assessment technique that reflects 
the impact of distortions which can dominate overall speech quality assessment by 
modeling the impact of such distortions on subjective speech quality assessment, thereby, 
5 accounting for language effects in objective speech quality assessment. 
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COMPENSATION FOR UTTERANCE DEPENDENT ARTICULATION 

FOR SPEECH QUALITY ASSESSMENT 

Field of the Inv e ntion 

5 Th e pr e s e nt invention relates g e n e rally to communications syst e ms and, in 

particular, to spe e ch quality ass e ssm e nt. 

Background of the Related Art

Performance of a wireless communication system can be measured,
among other things, in terms of speech quality. In the current art, there are two
techniques of speech quality assessment. The first technique is a subjective technique
(hereinafter referred to as "subjective speech quality assessment"). In subjective speech
quality assessment, human listeners are used to rate the speech quality of processed
speech, wherein processed speech is a transmitted speech signal which has been
processed at the receiver. This technique is subjective because it is based on the
perception of the individual human, and human assessment of speech quality typically
takes into account phonetic contents, speaking styles or individual speaker differences.
Subjective speech quality assessment can be expensive and time consuming.

The second technique is an objective technique (hereinafter referred to as
"objective speech quality assessment"). Objective speech quality assessment is not based
on the perception of the individual human. Most objective speech quality assessment
techniques are based on known source speech or reconstructed source speech estimated
from processed speech. However, these objective techniques do not account for phonetic
contents, speaking styles or individual speaker differences.

Accordingly, there exists a need for assessing speech quality objectively
which takes into account phonetic contents, speaking styles or individual speaker
differences.

Summary of the Invention

The present invention is a method for objective speech quality assessment
that accounts for phonetic contents, speaking styles or individual speaker differences by
distorting speech signals under speech quality assessment. By using a distorted version
of a speech signal, it is possible to compensate for different phonetic contents, different
individual speakers and different speaking styles when assessing speech quality. The
amount of degradation in the objective speech quality assessment caused by distorting the
speech signal remains similar for different speech signals, especially when the
distortion applied to the distorted version of the speech signal is severe. Objective speech
quality assessments for the distorted speech signal and the original undistorted speech
signal are compared to obtain a speech quality assessment compensated for utterance
dependent articulation. In one embodiment, the comparison corresponds to a difference
between the objective speech quality assessments for the distorted and undistorted speech
signals.

Brief Description of the Drawings

The features, aspects, and advantages of the present invention will become
better understood with regard to the following description, appended claims, and
accompanying drawings where:

Fig. 1 depicts an objective speech quality assessment arrangement which
compensates for utterance dependent articulation in accordance with the present
invention;

Fig. 2 depicts an embodiment of an objective speech quality assessment module
employing an auditory articulatory analysis module in accordance with the present
invention;

Fig. 3 depicts a flowchart for processing, in an articulatory analysis module, the
plurality of envelopes a_i(t) in accordance with one embodiment of the invention; and

Fig. 4 depicts an example illustrating a modulation spectrum A_i(m,f) in terms of
power versus frequency.

Detailed Description

The present invention is a method for objective speech quality assessment
that accounts for phonetic contents, speaking styles or individual speaker differences by
distorting processed speech. Objective speech quality assessment tends to yield different
values for different speech signals which have the same subjective speech quality scores. The
reason these values differ is that the speech signals have different distributions of spectral
contents in the modulation spectral domain. By using a distorted version of a processed
speech signal, it is possible to compensate for different phonetic contents, different
individual speakers and different speaking styles. The amount of degradation in the
objective speech quality assessment caused by distorting the speech signal remains similar
for different speech signals, especially when the distortion is severe. Objective speech
quality assessments for the distorted speech signal and the original undistorted speech
signal are compared to obtain a speech quality assessment compensated for utterance
dependent articulation.

Fig. 1 depicts an objective speech quality assessment arrangement 10
which compensates for utterance dependent articulation in accordance with the present
invention. Objective speech quality assessment arrangement 10 comprises a plurality of
objective speech quality assessment modules 12, 14, a distortion module 16 and a
compensation utterance specific bias module 18. Speech signal s(t) is provided as input
to distortion module 16 and objective speech quality assessment module 12. In distortion
module 16, speech signal s(t) is distorted to produce a modulated noise reference unit
(MNRU) speech signal s'(t). In other words, distortion module 16 produces a noisy
version of input signal s(t). MNRU speech signal s'(t) is then provided as input to
objective speech quality assessment module 14.
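
As an illustration of the kind of processing distortion module 16 might perform, the following is a minimal Python sketch of an MNRU-style distortion. The description does not give the distortion formula; the sketch assumes the multiplicative-noise form commonly associated with the modulated noise reference unit, s'(t) = s(t)[1 + 10^(-Q/20) n(t)], with n(t) drawn as unit-variance Gaussian noise. The function name `mnru_distort`, the parameter `q_db` and the test tone are hypothetical.

```python
import math
import random


def mnru_distort(s, q_db, rng=None):
    """Return a noisy copy of s (assumed form, not taken from the description).

    Each sample is scaled by (1 + 10**(-q_db / 20) * n), where n is
    unit-variance Gaussian noise, so the noise follows the signal level.
    A smaller q_db means a more severe distortion.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    gain = 10.0 ** (-q_db / 20.0)
    return [x * (1.0 + gain * rng.gauss(0.0, 1.0)) for x in s]


# Hypothetical input: a 100 Hz tone sampled at 8 kHz, distorted at Q = 10 dB.
s = [math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(800)]
s_prime = mnru_distort(s, q_db=10.0)
```

Because the noise is multiplied by the signal, silent samples stay silent and louder samples receive proportionally more noise.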

In objective speech quality assessment modules 12, 14, speech signal s(t)
and MNRU speech signal s'(t) are processed to obtain objective speech quality
assessments SQ(s(t)) and SQ(s'(t)). Objective speech quality assessment modules 12, 14
are essentially identical in terms of the type of processing performed on any input speech
signals. That is, if both objective speech quality assessment modules 12, 14 receive the
same input speech signal, the output signals of both modules 12, 14 would be
approximately identical. Note that, in other embodiments, objective speech quality
assessment modules 12, 14 may process speech signals s(t) and s'(t) in a manner different
from each other. Objective speech quality assessment modules are well known in the art.
An example of such a module will be described later herein.

Objective speech quality assessments SQ(s(t)) and SQ(s'(t)) are then
compared to obtain speech quality assessment SQ_compensated, which compensates for
utterance dependent articulation. In one embodiment, speech quality assessment
SQ_compensated is determined using the difference between objective speech quality
assessments SQ(s(t)) and SQ(s'(t)). For example, SQ_compensated is equal to SQ(s(t)) minus
SQ(s'(t)), or vice versa. In another embodiment, speech quality assessment SQ_compensated
is determined based on a ratio between objective speech quality assessments SQ(s(t)) and
SQ(s'(t)). For example,

\[ SQ_{compensated} = \frac{SQ(s(t)) + \mu}{SQ(s'(t)) + \mu} \quad \text{or} \quad SQ_{compensated} = \frac{SQ(s'(t)) + \mu}{SQ(s(t)) + \mu} \]

where μ is a small constant value.
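
The two comparison embodiments can be sketched in a few lines of Python. This is only an illustration: the function names and the default value of the small constant `mu` are assumptions, and the inputs stand for the assessments SQ(s(t)) and SQ(s'(t)) produced by modules 12 and 14.

```python
def compensated_difference(sq_s, sq_s_prime):
    """Difference embodiment: SQ(s(t)) minus SQ(s'(t))."""
    return sq_s - sq_s_prime


def compensated_ratio(sq_s, sq_s_prime, mu=1e-6):
    """Ratio embodiment: (SQ(s(t)) + mu) / (SQ(s'(t)) + mu).

    mu is a small constant (the default here is an arbitrary choice) that
    keeps the ratio defined when the denominator assessment is zero.
    """
    return (sq_s + mu) / (sq_s_prime + mu)
```

Swapping the two arguments gives the "vice versa" difference and the reciprocal form of the ratio.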

As mentioned earlier, objective speech quality assessment modules 12, 14
are well known in the art. Fig. 2 depicts an embodiment 20 of an objective speech
quality assessment module 12, 14 employing an auditory articulatory analysis module in
accordance with the present invention. As shown in Fig. 2, objective quality assessment
module 20 comprises cochlear filterbank 22, envelope analysis module 24 and
articulatory analysis module 26. In objective quality assessment module 20, speech
signal s(t) is provided as input to cochlear filterbank 22. Cochlear filterbank 22
comprises a plurality of cochlear filters h_i(t) for processing speech signal s(t) in
accordance with a first stage of a peripheral auditory system, where i = 1, 2, ..., N_c represents
a particular cochlear filter channel and N_c denotes the total number of cochlear filter
channels. Specifically, cochlear filterbank 22 filters speech signal s(t) to produce a
plurality of critical band signals s_i(t), wherein critical band signal s_i(t) is equal to
s(t)*h_i(t).

The plurality of critical band signals s_i(t) is provided as input to envelope
analysis module 24. In envelope analysis module 24, the plurality of critical band signals
s_i(t) is processed to obtain a plurality of envelopes a_i(t), wherein

\[ a_i(t) = \sqrt{s_i^2(t) + \hat{s}_i^2(t)} \]

and \hat{s}_i(t) is the Hilbert transform of s_i(t).
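
The envelope computation can be illustrated with a short, standard-library-only Python sketch. It builds the analytic signal of one critical band signal in the frequency domain (real part s_i(t), imaginary part its Hilbert transform) and takes the magnitude, which equals the square root of s_i^2(t) plus the squared Hilbert transform. The naive O(n^2) DFT and the function names are assumptions made to keep the example self-contained; a practical implementation would use an FFT library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short sketch."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def envelope(s_i):
    """Envelope a_i(t) = |s_i(t) + j * Hilbert{s_i}(t)| of one band signal."""
    n = len(s_i)
    spectrum = dft([complex(v) for v in s_i])
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue              # leave the Nyquist bin unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2.0    # double positive frequencies
        else:
            spectrum[k] = 0j      # remove negative frequencies
    analytic = dft(spectrum, inverse=True)
    return [abs(z) for z in analytic]


# A pure tone with a whole number of cycles has a flat envelope of 1.
tone = [math.cos(2.0 * math.pi * 4 * t / 64) for t in range(64)]
env = envelope(tone)
```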

The plurality of envelopes a_i(t) is then provided as input to articulatory
analysis module 26. In articulatory analysis module 26, the plurality of envelopes a_i(t) is
processed to obtain a speech quality assessment for speech signal s(t). Specifically,
articulatory analysis module 26 compares the power associated with signals
generated from the human articulatory system (hereinafter referred to as "articulation
power P_A(m,i)") with the power associated with signals not generated from the human
articulatory system (hereinafter referred to as "non articulation power P_NA(m,i)"). Such
comparison is then used to make a speech quality assessment.

Fig. 3 depicts a flowchart 300 for processing, in articulatory analysis
module 26, the plurality of envelopes a_i(t) in accordance with one embodiment of the
invention. In step 310, a Fourier transform is performed on frame m of each of the
plurality of envelopes a_i(t) to produce modulation spectrums A_i(m,f), where f is
frequency.

Fig. 4 depicts an example 40 illustrating modulation spectrum A_i(m,f) in
terms of power versus frequency. In example 40, articulation power P_A(m,i) is the power
associated with frequencies 2 - 12.5 Hz, and non articulation power P_NA(m,i) is the power
associated with frequencies greater than 12.5 Hz. Power P_No(m,i) associated with
frequencies less than 2 Hz is the DC component of frame m of critical band signal a_i(t).
In this example, articulation power P_A(m,i) is chosen as the power associated with
frequencies 2 - 12.5 Hz based on the fact that the speed of human articulation is 2 - 12.5
Hz, and the frequency ranges associated with articulation power P_A(m,i) and non
articulation power P_NA(m,i) (hereinafter referred to respectively as "articulation
frequency range" and "non articulation frequency range") are adjacent, non overlapping
frequency ranges. It should be understood that, for purposes of this application, the term
"articulation power P_A(m,i)" should not be limited to the frequency range of human
articulation or the aforementioned frequency range 2 - 12.5 Hz. Likewise, the term
"non articulation power P_NA(m,i)" should not be limited to frequency ranges greater than
the frequency range associated with articulation power P_A(m,i). The non articulation
frequency range may or may not overlap with or be adjacent to the articulation frequency
range. The non articulation frequency range may also include frequencies less than the
lowest frequency in the articulation frequency range, such as those associated with the
DC component of frame m of critical band signal a_i(t).

In step 320, for each modulation spectrum A_i(m,f), articulatory analysis
module 26 performs a comparison between articulation power P_A(m,i) and non
articulation power P_NA(m,i). In this embodiment of articulatory analysis module 26, the
comparison between articulation power P_A(m,i) and non articulation power P_NA(m,i) is an
articulation to non articulation ratio ANR(m,i). The ANR is defined by the following
equation

\[ ANR(m,i) = \frac{P_A(m,i) + \varepsilon}{P_{NA}(m,i) + \varepsilon} \qquad \text{equation (1)} \]

where ε is some small constant value. Other comparisons between articulation power
P_A(m,i) and non articulation power P_NA(m,i) are possible. For example, the comparison
may be the reciprocal of equation (1), or the comparison may be a difference between
articulation power P_A(m,i) and non articulation power P_NA(m,i). For ease of discussion,
the embodiment of articulatory analysis module 26 depicted by flowchart 300 will be
discussed with respect to the comparison using ANR(m,i) of equation (1). This should
not, however, be construed to limit the present invention in any manner.
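
Steps 310 and 320 can be sketched together in Python. This is a minimal illustration under stated assumptions: the envelope frame is taken to be uniformly sampled at `sample_rate_hz`, the band edges default to the 2 - 12.5 Hz example range, power is measured as the squared magnitude of each DFT bin, and the function names and the default `eps` are hypothetical.

```python
import cmath
import math


def band_powers(frame, sample_rate_hz, lo_hz=2.0, hi_hz=12.5):
    """Split the modulation-spectrum power of one envelope frame into
    (articulation, non articulation, DC-component) power."""
    n = len(frame)
    p_a = p_na = p_no = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2
        freq_hz = k * sample_rate_hz / n
        if freq_hz < lo_hz:
            p_no += power      # below the articulation range (DC component)
        elif freq_hz <= hi_hz:
            p_a += power       # articulation frequency range
        else:
            p_na += power      # non articulation frequency range
    return p_a, p_na, p_no


def anr(p_a, p_na, eps=1e-6):
    """Articulation to non articulation ratio of equation (1)."""
    return (p_a + eps) / (p_na + eps)


def modulated_envelope(mod_hz, sample_rate_hz=64, n=64):
    """Hypothetical test envelope: constant level plus one modulation tone."""
    return [1.0 + 0.5 * math.cos(2.0 * math.pi * mod_hz * t / sample_rate_hz)
            for t in range(n)]


slow = band_powers(modulated_envelope(4.0), 64)    # modulation inside 2 - 12.5 Hz
fast = band_powers(modulated_envelope(20.0), 64)   # modulation above 12.5 Hz
```

An envelope modulated at 4 Hz concentrates its power in the articulation range and yields a large ANR, whereas a 20 Hz modulation yields an ANR well below one.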

In step 330, ANR(m,i) is used to determine local speech quality LSQ(m)
for frame m. Local speech quality LSQ(m) is determined using an aggregate of the
articulation to non articulation ratio ANR(m,i) across all channels i and a weighing
factor R(m,i) based on the DC component power P_No(m,i). Specifically, local speech
quality LSQ(m) is determined using the following equation

\[ LSQ(m) = \log\left[ \sum_{i=1}^{N_c} ANR(m,i)\, R(m,i) \right] \qquad \text{equation (2)} \]

where

\[ R(m,i) = \frac{\log\left(1 + P_{No}(m,i)\right)}{\sum_{k=1}^{N_c} \log\left(1 + P_{No}(m,k)\right)} \qquad \text{equation (3)} \]

and k is a frequency index.
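
A minimal Python sketch of step 330 follows. It assumes that the weighing factor normalizes log(1 + P_No(m,i)) by its sum over all channels, so that the weights of a frame add up to one; the function names are hypothetical and the inputs are per-channel lists for a single frame m.

```python
import math


def weights(p_no):
    """Weighing factors R(m,i) for one frame (assumed normalized form).

    p_no holds the DC-component power P_No(m,i) of each channel; at least
    one entry must be positive so that the normalizing sum is non-zero.
    """
    logs = [math.log(1.0 + p) for p in p_no]
    total = sum(logs)
    return [v / total for v in logs]


def local_speech_quality(anr_values, p_no):
    """LSQ(m): log of the weighted sum of ANR(m,i) across all channels."""
    r = weights(p_no)
    return math.log(sum(a * w for a, w in zip(anr_values, r)))
```

With equal DC-component power in every channel the weights are uniform, and LSQ(m) reduces to the log of the mean ANR across channels.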

In step 340, overall speech quality SQ for speech signal s(t) is determined
using local speech quality LSQ(m) and a log power P_s(m) for frame m. Specifically,
speech quality SQ is determined using the following equation

\[ SQ = L\left\{ P_s(m)\, LSQ(m) \right\}_{m=1,\; P_s > P_{th}}^{T} = \left[ \sum_{m=1,\; P_s > P_{th}}^{T} P_s^{\lambda}(m)\, LSQ^{\lambda}(m) \right]^{1/\lambda} \qquad \text{equation (4)} \]

where \( P_s(m) = \log\left[ \sum_{t \in m} s^2(t) \right] \), L is the L_p-norm, T is the total number of
frames in speech signal s(t), λ is any value, and P_th is a threshold for distinguishing
between audible signals and silence. In one embodiment, λ is preferably an odd integer value.
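
Step 340 can be sketched as follows. This is an illustration only: it assumes an integer exponent `lam` (standing for the exponent of the norm), treats frames whose log power does not exceed the threshold as silence, and restores the sign after taking the root so that an odd exponent preserves negative contributions. The function names are hypothetical.

```python
import math


def frame_log_power(frame):
    """Log power P_s(m) of one frame; the frame must not be all zeros."""
    return math.log(sum(x * x for x in frame))


def overall_speech_quality(p_s, lsq, p_th, lam=3):
    """Combine per-frame values into SQ using only frames with P_s(m) > p_th.

    Computes [sum over audible frames of (P_s(m) * LSQ(m))**lam] ** (1 / lam),
    keeping the sign of the sum when it is negative.
    """
    total = 0.0
    for power, quality in zip(p_s, lsq):
        if power > p_th:       # skip frames classified as silence
            total += (power * quality) ** lam
    return math.copysign(abs(total) ** (1.0 / lam), total)
```

For example, with `lam=1` the result is simply the sum of P_s(m) LSQ(m) over the audible frames.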

The output of articulatory analysis module 26 is an assessment of speech
quality SQ over all frames m. That is, speech quality SQ is a speech quality assessment
for speech signal s(t).

Although the present invention has been described in considerable detail
with reference to certain embodiments, other versions are possible. Therefore, the spirit
and scope of the present invention should not be limited to the description of the
embodiments contained herein.
Claims
I claim:

1. A method of assessing speech quality comprising the steps of:

determining a first and second speech quality assessment for a first and
second speech signal, the first speech signal being a distorted version of the
second speech signal; and

comparing the first and second speech qualities to obtain a compensated
speech quality assessment.

2. The method of claim 1 comprising the additional steps of:

prior to determining the first and second speech quality assessments,
distorting the second speech signal to produce the first speech signal.

3. The method of claim 1, wherein the first and second speech qualities are assessed
using an identical technique for objective speech quality assessment.

4. The method of claim 1, wherein the compensated speech quality assessment
corresponds to a difference between the first and second speech qualities.

5. The method of claim 1, wherein the compensated speech quality assessment
corresponds to a ratio between the first and second speech qualities.

6. The method of claim 1, wherein the first and second speech qualities are assessed
using auditory articulatory analysis.

7. The method of claim 1, wherein the step of assessing the second or first speech
quality comprises the steps of:

comparing articulation power and non articulation power for the speech
signal or distorted speech signal, wherein articulation and non articulation powers
are powers associated with articulation and non articulation frequencies of the
speech signal or distorted speech signal; and

assessing the second or first speech quality based on the comparison.

8. The method of claim 7, wherein the articulation frequencies are approximately
2 - 12.5 Hz.

9. The method of claim 7, wherein the articulation frequencies correspond
approximately to a speed of human articulation.

10. The method of claim 7, wherein the non articulation frequencies are
approximately greater than the articulation frequencies.

11. The method of claim 7, wherein the comparison between the articulation power
and non articulation power is a ratio between the articulation power and non
articulation power.

12. The method of claim 10, wherein the ratio includes a denominator and numerator,
the numerator including the articulation power and a small constant, the
denominator including the non articulation power plus the small constant.

13. The method of claim 7, wherein the comparison between the articulation power
and non articulation power is a difference between the articulation power and non
articulation power.

14. The method of claim 7, wherein the step of assessing the first or second speech
quality includes the step of:

determining a local speech quality using the comparison.

15. The method of claim 7, wherein the local speech quality is further determined
using a weighing factor based on a DC component power.
16. The method of claim 9, wherein the first or second speech quality is determined
using the local speech quality.

17. The method of claim 7, wherein the step of comparing articulation power and non
articulation power includes the step of:

performing a Fourier transform on each of a plurality of envelopes
obtained from a plurality of critical band signals.

18. The method of claim 7, wherein the step of comparing articulation power and non
articulation power includes the step of:

filtering the speech signal to obtain a plurality of critical band signals.

19. The method of claim 18, wherein the step of comparing articulation power and
non articulation power includes the step of:

performing an envelope analysis on the plurality of critical band signals to
obtain a plurality of modulation spectrums.

20. The method of claim 18, wherein the step of comparing articulation power and
non articulation power includes the step of:

performing a Fourier transform on each of the plurality of modulation
spectrums.
Abstract of the Disclosure

A method for objective speech quality assessment that accounts for
phonetic contents, speaking styles or individual speaker differences by distorting speech
signals under speech quality assessment. By using a distorted version of a speech signal,
it is possible to compensate for different phonetic contents, different individual speakers
and different speaking styles when assessing speech quality. The amount of degradation
in the objective speech quality assessment caused by distorting the speech signal remains
similar for different speech signals, especially when the distortion applied to the
distorted version of the speech signal is severe. Objective speech quality assessments for the
distorted speech signal and the original undistorted speech signal are compared to obtain
a speech quality assessment compensated for utterance dependent articulation.